Decomposer for parallel turbo decoding, process and integrated circuit

ABSTRACT

A decoder for accessing data stored in n memories comprises a function matrix containing addresses of the memory locations at unique coordinates. A decomposer sorts addresses from coordinate locations of first and second m×n matrices, such that each row contains no more than one address from the same memory. Positional apparatus stores entries in third and fourth m×n matrices identifying coordinates of addresses in the function matrix such that each entry in the third matrix is at coordinates that match corresponding coordinates in the first matrix, and each entry in the fourth matrix is at coordinates that match corresponding coordinates in the second matrix. The decoder is responsive to entries in the matrices for accessing data in parallel from the memories.

FIELD OF THE INVENTION

[0001] This invention relates to parallel data processing, and particularly to integrated circuits that perform parallel turbo decoding.

BACKGROUND OF THE INVENTION

[0002] Data processing systems using convolutional codes are theoretically capable of reaching the Shannon limit, a theoretical limit of signal-to-noise for error-free communications. Prior to the discovery of turbo codes in 1993, convolutional codes were decoded with Viterbi decoders. However, as error correction requirements increased, the complexity of Viterbi decoders increased exponentially. Consequently, a practical limit on systems employing Viterbi decoders to decode convolutional codes was about 3 to 6 dB from the Shannon limit. The introduction of turbo codes allowed the design of practical decoders capable of achieving a performance about 0.7 dB from the Shannon limit, surpassing the performance of convolutional-encoder/Viterbi-decoders of similar complexity. Therefore, turbo codes offered a significant advantage over prior code techniques.

[0003] Convolutional codes are generated by interleaving data. There are two types of turbo code systems: ones that use parallel concatenated convolutional codes, and ones that use serially concatenated convolutional codes. Data processing systems that employ parallel concatenated convolutional codes decode the codes in several stages. In a first stage, the original data (e.g. sequence of symbols) are processed, and in a second stage the data obtained by permuting the original sequence of symbols are processed, usually using the same process as in the first stage. The data are processed in parallel, requiring that the data be stored in several memories and accessed in parallel for the respective stage. However, parallel processing often causes conflicts. More particularly, two or more elements or sets of data that are required to be accessed in a given cycle may be in the same memory, and therefore not accessible in parallel. Consequently, the problem becomes one of organizing access to the data so that all required data can be simultaneously accessed in each of the processing stages.

[0004] Traditionally, turbo decoding applications increased throughput by adding parallel turbo decoders. However, in integrated circuit (IC) designs, the additional decoders were embodied on the IC and necessarily increased chip area dramatically. There is a need for a turbo decoder that achieves high throughput without duplication of parallel turbo decoders, thereby achieving reduced IC chip area.

SUMMARY OF THE INVENTION

[0005] The present invention is directed to a decomposer for turbo decoders, which makes possible parallel access to direct and interleaved information. When implemented in an IC chip, the decomposer eliminates the need for turbo decoder duplications, thereby significantly reducing chip area over prior decoders.

[0006] In one form of the invention, a process is provided to access data stored at addressable locations in n memories. A function matrix is provided having coordinates containing addresses of the addressable locations in the memories. A set of addresses from first and second matrices, each having m rows and n columns, is sorted into unique coordinate locations such that each row contains no more than one address of a location from each respective memory. Third and fourth matrices are created, each having m rows and n columns. The third and fourth matrices contain entries identifying coordinates of addresses in the function matrix such that each entry in the third matrix is at coordinates that match corresponding coordinates in the first matrix and each entry in the fourth matrix is at coordinates that match corresponding coordinates in the second matrix. Data are accessed in parallel from the memories using the matrices.

[0007] In some embodiments, the addresses are organized into first and second sets, S_(r) ^(q), each containing the addresses. The sets are sorted into the first and second matrices. More particularly, for each set, a plurality of edges between the addresses are identified such that each edge contains two addresses, and each address is unconnected or in not more than two edges. The edges are linked into a sequence, and are alternately assigned to the first and second sets.

[0008] In some embodiments, each set, S_(r) ^(q), of addresses is iteratively divided into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1) which are placed into respective rows of the respective first and second matrices, until each row contains no more than one address of a location in each respective memory.

[0009] In other embodiments, a decomposer is provided to decompose interleaved convolutional codes. The decomposer includes the first, second, third and fourth matrices.

[0010] In yet other embodiments, an integrated circuit includes a decoder and a decomposer including the first, second, third and fourth matrices.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a flowchart of a process of partitioning data into memories in accordance with an aspect of the present invention.

[0012] FIGS. 2-5 are illustrations useful in explaining the process of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0013] The present invention is directed to a decomposer for turbo code decoding, which eliminates the need for turbo decoder duplications.

[0014] The premise of the present invention can be generalized by considering two arbitrary permutations of a set of numbers, which represents addresses in n memories where data for processing are stored. Assume that each memory is capable of storing a maximal number, m, of words. The addresses can be represented in two tables (matrices), one for each processing stage. Each table has m rows and n columns, and each row represents addresses to be accessed simultaneously during a given clock cycle. Each column represents the addresses in one memory.

[0015] In accordance with the present invention, the addresses are partitioned into groups such that each row in each of the two tables does not contain more than one address from the same group. Then, storing the data for each group of addresses in its own memory allows simultaneous access to all addresses from any row and any table through access to different memories.

[0016] The algorithm to partition addresses uses input integer numbers m and n, and two m×n matrices, T₁ and T₂, which represent two different permutations of a set of numbers S={0,1,2, . . . , n*m−1}. The numbers of set S represent addresses in the respective memory. The process of the present invention determines a function whose input set is in the form of {0,1,2, . . . , n*m−1} and provides an output set {0,1,2, . . . , 2^(k)−1}, where 2^(k−1)<n≦2^(k), f:{0,1,2, . . . , n*m−1}→{0,1,2, . . . , 2^(k)−1}, such that for every i, j₁, j₂ with j₁≠j₂ the relationship f(Tα[i][j₁])≠f(Tα[i][j₂]) is satisfied, where α=1,2. The resulting partitioning gives 2^(k) subsets of S, one for each function value, such that set S is represented as S=S₀∪S₁∪S₂ . . . ∪S_(2^(k)−1).
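By way of illustration only, the two conditions stated above (the choice of k and the requirement that no row of either table repeats a function value) might be checked as in the following sketch. The names num_stages and is_conflict_free, and the toy tables, are hypothetical and do not come from the figures.

```python
def num_stages(n: int) -> int:
    """Return the k that satisfies 2**(k-1) < n <= 2**k."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def is_conflict_free(f, tables):
    """True if no row of any table holds two addresses with the same f value."""
    for table in tables:
        for row in table:
            values = [f[address] for address in row]
            if len(set(values)) != len(values):
                return False
    return True


# Hypothetical toy case: m = 2 rows, n = 2 memories, S = {0, 1, 2, 3}.
T1 = [[0, 1], [2, 3]]
T2 = [[3, 0], [1, 2]]
f = {0: 0, 1: 1, 2: 0, 3: 1}   # assumed assignment of addresses to memories

print(num_stages(2))                  # 1
print(is_conflict_free(f, [T1, T2]))  # True
```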

[0017] The output of the algorithm is a set of matrices, T₁ and T₂, which provides the addresses of the memories (numbers from 0 to 2^(k)−1) and the local addresses of all data required to be accessed simultaneously within the memories for a processing stage.

[0018] Set S is partitioned in k stages. An intermediate stage is denoted by r, where 0≦r<k. At each stage, set S_(r) ^(q) is divided into two subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), where q is an index symbolically denoting the original set, q, divided into two new sets, 2q and 2q+1. Starting with r=0, q=1, the initial set, S=S_(r) ^(q), is divided into two subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1). At the next stage, sets S_(r+1) ^(2q) and S_(r+1) ^(2q+1) are each divided into two descendants, S_(r+1) ^(2q)=S_(r+2) ^(2(2q))∪S_(r+2) ^(2(2q)+1) and S_(r+1) ^(2q+1)=S_(r+2) ^(2(2q+1))∪S_(r+2) ^(2(2q+1)+1). The partitioning iterates until r=k, at which point the number of elements of each subset in each row is either 0 or 1. For example, the initial set where r=0, S=S₀ ^(q), is divided into two subsets S₁ ^(2q) and S₁ ^(2q+1); sets S₁ ^(2q) and S₁ ^(2q+1) are each divided into two descendants, S₁ ^(2q)=S₂ ^(2(2q))∪S₂ ^(2(2q)+1) and S₁ ^(2q+1)=S₂ ^(2(2q+1))∪S₂ ^(2(2q+1)+1).

[0019] The number of elements in each intermediate set is m*n*2^(−r) if that value is an integer, and otherwise is one of the two integers closest to it. For each intermediate set in the process, the number of set elements in any single row of matrices T₁ and T₂ is less than or equal to n*2^(−r), rounded up when that value is not an integer.

[0020] At the end point (where r=k), the number of elements from each set S_(2^(k)−1) ^(q) in each row of matrices T₁ and T₂ is equal to 0 or 1, meaning that function f is determined (the indexes of subsets S_(2^(k)−1) ^(q) are values of f) and there is no need for further partitioning. Thus, there is no row in either matrix T₁ or T₂ that contains more than one element from the same subset. Hence, all numbers in a row have different function values.

[0021] The process of the partitioning algorithm is illustrated in FIG. 1. The process commences at step 100 with the input of the number n of memories and the size m of each memory. The value of r is initialized at 0. At step 102, k is calculated from the relationship 2^(k−1)<n≦2^(k). S_(r) ^(q) is generated at step 104. Thus, at the first iteration, S₀ ^(q) is generated. If, at step 106, r is smaller than k, then at step 108 S_(r) ^(q) is divided as S_(r) ^(q)=S_(r+1) ^(2q)∪S_(r+1) ^(2q+1). At step 110, the value of r is incremented by one and the process loops back to step 104 to operate on the recursions S₁ ^(2q) and S₁ ^(2q+1). Assuming r is still smaller than k at step 106, for the second iteration where r=1, S₁ ^(2q) is divided as S₁ ^(2q)=S₂ ^(2(2q))∪S₂ ^(2(2q)+1) and S₁ ^(2q+1) is divided as S₁ ^(2q+1)=S₂ ^(2(2q+1))∪S₂ ^(2(2q+1)+1). The process continues until r is equal to k at step 106. As long as r<k, the number of S_(r) ^(q) elements (addresses) resulting from each iteration of division in one row of T₁ and T₂ may be more than one. When r=k, each division result contains one or no S_(r) ^(q) elements in a row of T₁ and T₂. The process ends at step 112, and the set S is partitioned into 2^(k) subsets.
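The loop of FIG. 1 may be sketched, under stated assumptions, as follows. This is not the claimed implementation. It assumes the dividing step pairs adjacent points of the current set within each row of each table, pairs leftover points of odd rows with one another, and then two-colours the resulting graph so that paired points fall into different subsets. Subset indices are assumed to start at 0, so the final index of each subset serves directly as the value of f. All function names are hypothetical.

```python
def build_edges(table, subset):
    """Pair adjacent members of `subset` in each row; pair odd leftovers across rows."""
    partner, leftover = {}, None
    for row in table:
        pts = [a for a in row if a in subset]
        for i in range(0, len(pts) - 1, 2):
            partner[pts[i]], partner[pts[i + 1]] = pts[i + 1], pts[i]
        if len(pts) % 2:                      # one point of this row is still unpaired
            if leftover is None:
                leftover = pts[-1]
            else:
                partner[leftover], partner[pts[-1]] = pts[-1], leftover
                leftover = None
    return partner


def split(subset, t1, t2):
    """Divide `subset` in two so that every edge joins points of different halves."""
    p1, p2 = build_edges(t1, subset), build_edges(t2, subset)
    colour = {}
    for start in sorted(subset):
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            a = stack.pop()
            for partner in (p1, p2):
                b = partner.get(a)
                if b is not None and b not in colour:
                    colour[b] = 1 - colour[a]
                    stack.append(b)
    even = {a for a in subset if colour[a] == 0}
    return even, subset - even


def partition(t1, t2, n):
    """Return f as a dict mapping each address to a subset index in 0 .. 2**k - 1."""
    k = (n - 1).bit_length()                  # 2**(k-1) < n <= 2**k
    sets = [{a for row in t1 for a in row}]   # stage r = 0: the whole set S
    for _ in range(k):                        # set q yields sets 2q and 2q+1
        sets = [half for s in sets for half in split(s, t1, t2)]
    return {a: q for q, s in enumerate(sets) for a in s}


# Hypothetical 3x3 tables (not those of the figures).
T1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
T2 = [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
f = partition(T1, T2, 3)
```

Because each point has at most one edge per table, the combined graph consists only of paths and even-length cycles, so the two-colouring in split always succeeds and each row's count is roughly halved at every stage.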

[0022] Consider a set S_(r) ^(q)={18,11,27,4,10,16,20,14,2} representing memory elements (addresses) at some partitioning stage. The object is to partition S_(r) ^(q) into subsets such that upon completion of the final stage there are no two elements from the same set in the same row of tables T₁ and T₂ (FIG. 2). FIG. 3 illustrates the process of partitioning, which includes a first step 120 that constructs two sets of edges, one set per table. The second step 122 links the constructed edges into lists, which are then used in the final step 124 to produce two subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1) for each table.

[0023] At step 120, the edges are constructed by connecting two adjacent points in each row. As used herein, the term “point” refers to corresponding numbers in the input set. If the row contains an odd number of points, the remaining point is connected with the next remaining point from the next row that also has an odd number of elements. If, after all rows are processed, there is still a point without a pair, that point is left unconnected. For the example of FIG. 2, the two edge sets are

[0024] E₁={(18,11), (27,4), (10,16), (20,14)} and

[0025] E₂={(27,16), (20,4), (10,2), (14,18)}.

[0026] Points 2 in T₁ and 11 in T₂ are unconnected.

[0027] At step 122, the edges and points identified in step 120 are linked into lists. Each list starts at a point and ends at the same or different point. This step starts at any point from the set being divided, and looks alternately in tables T₁ and T₂ for list elements. For purposes of illustration, assume the starting point is point 18 and table T₁ in FIG. 2. Edge (18,11) is the first in the list. Next, a point (if it exists) is found in table T₂ that is connected to the end of edge (18,11). In this case point 11 is not connected to any other point in table T₂, so point 18, from the start of the edge, is considered. In this case, table T₂ identifies that point 14 is connected in an edge with point 18. Because the edge (14,18) found in table T₂ is connected to the first point (18) of edge (18,11), the direction of movement through the list is reversed and edge (14,18) is added to the trailing end. Next the process looks for a point in table T₁ connected to the end (point 14) of the list in the direction of movement. Because point 14 is edged with point 20 in table T₁, point 20 is the next point of the list. The process continues until the second end of the list (point 2) is reached. If, at the end of the list, all points from the set S_(r) ^(q) are included in the linking, the linking operation is finished. If there are points that do not belong to any list, a new list is started. In the example of FIG. 2, all points are in one list. There may be any number of lists and there may be none or one “isolated” (unconnected) point.
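The linking of step 122 may be illustrated with the edge sets E₁ and E₂ given above. The following is a minimal sketch, assuming each point has at most one edge per table; the function name link_edges and its internal helper are hypothetical. Starting at point 18, it walks first through the T₁ edge, then reverses and walks the other way through the T₂ edge, alternating tables at every step.

```python
def link_edges(e1, e2, points):
    """Link the edges of two tables into lists whose edges alternate between tables."""
    nbr1, nbr2 = {}, {}
    for nbr, edges in ((nbr1, e1), (nbr2, e2)):
        for a, b in edges:
            nbr[a], nbr[b] = b, a
    seen, lists = set(), []

    def walk(start, first, second):
        # Follow edges away from `start`, taking the first edge from `first`.
        chain, cur, step = [], start, 0
        while True:
            nxt = (first, second)[step % 2].get(cur)
            if nxt is None or nxt in seen:
                return chain
            seen.add(nxt)
            chain.append(nxt)
            cur, step = nxt, step + 1

    for p in points:
        if p in seen:
            continue                          # already placed in an earlier list
        seen.add(p)
        forward = walk(p, nbr1, nbr2)         # through the T1 edge of p
        backward = walk(p, nbr2, nbr1)        # reversed direction, through the T2 edge of p
        lists.append(backward[::-1] + [p] + forward)
    return lists


E1 = [(18, 11), (27, 4), (10, 16), (20, 14)]
E2 = [(27, 16), (20, 4), (10, 2), (14, 18)]
S = [18, 11, 27, 4, 10, 16, 20, 14, 2]
lists = link_edges(E1, E2, S)
print(lists)   # [[2, 10, 16, 27, 4, 20, 14, 18, 11]]
```

The single list runs from point 2 to point 11, consistent with the text: point 11 has no edge in T₂ and point 2 has no edge in T₁, so they form the two ends.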

[0028] After completing the linkages of step 122, the points are identified as odd or even, starting from any point. The starting point and all points separated by an odd number of points from the starting point (all even points) are inserted into S_(r+1) ^(2q). All other points (all odd points) are inserted into S_(r+1) ^(2q+1). For example, the points can be indexed with 0 and 1 so that neighboring points have different indices. Thus, all points with a “0” index are inserted into one set (S_(r+1) ^(2q)) and all points with a “1” index are in the other set (S_(r+1) ^(2q+1)). In the example of FIG. 2, starting indexing at point 11, the result of this dividing are sets: S_(r+1) ^(2q)={11,14,4,16,2} and S_(r+1) ^(2q+1)={18,20,27,10}. Sets S_(r+1) ^(2q) and S_(r+1) ^(2q+1) are further partitioned until r=k and no row contains more than one element from the same resulting subset.
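The odd/even assignment can be reproduced for the example with a short sketch. The list below is the single linked list of the example written out from point 11 to point 2; the function name split_by_parity is hypothetical.

```python
def split_by_parity(lists):
    """Give index 0 to every other point of each list, and index 1 to the rest."""
    even, odd = [], []
    for chain in lists:
        even.extend(chain[0::2])   # starting point and every second point after it
        odd.extend(chain[1::2])    # the points in between
    return even, odd


chain = [11, 18, 14, 20, 4, 27, 16, 10, 2]
even, odd = split_by_parity([chain])
print(even)   # [11, 14, 4, 16, 2]
print(odd)    # [18, 20, 27, 10]
```

Since neighboring points in a list are joined by an edge, the two points of every edge land in different subsets, which is what reduces the number of same-subset points per row.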

[0029] The outputs of the process are the function f matrix and two “positional” matrices, P₁ and P₂, that identify the position of elements in starting tables (matrices) T₁ and T₂. The four matrices P₁, P₂, T₁ and T₂ allow the necessary parallelism in data reading. Function f is represented in the form of a matrix whose column indices are its values and whose column elements are the numbers from the input set which have that value. Thus, in FIG. 5 each column of matrix f contains addresses from one memory. The positional matrices P₁ and P₂ have the same dimensions as matrices T₁ and T₂, namely m×n. For each position (i,j) in a matrix T₁ or T₂, the corresponding position in the corresponding matrix P₁ or P₂ identifies a position of the corresponding element, T₁[i][j] or T₂[i][j], in matrix f. For example, in FIG. 5 element T₁[2][1]=5 in matrix T₁ identifies a position (i,j) in positional matrix P₁ of element P₁[2][1]. Element P₁[2][1] identifies the row and column coordinates (1,5) of element T₁[2][1]=5 in matrix f. In matrix T₂, element T₂[5][4]=5 identifies positional element P₂[5][4] which identifies the coordinates (1,5) in matrix f of T₂[5][4]=5. Similarly, in matrix T₂, element T₂[2][1] identifies the (i,j) position in positional matrix P₂, which in turn identifies the row and column coordinates (4,7) of element T₂[2][1]=15 in matrix f.
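As a rough illustration of how the function matrix and a positional matrix relate, the following sketch builds both from a given f. The ordering of addresses within a column (ascending here) and the use of None to pad short columns are assumptions, as are the function names and the toy data, which are not taken from FIG. 5.

```python
def build_function_matrix(f, num_memories):
    """Column j lists the addresses whose f value is j; rows are local addresses."""
    columns = [[] for _ in range(num_memories)]
    for address in sorted(f):
        columns[f[address]].append(address)
    depth = max(len(c) for c in columns)
    # Pad short columns with None so that the result is rectangular.
    return [[c[i] if i < len(c) else None for c in columns] for i in range(depth)]


def build_positional(table, fmatrix):
    """P[i][j] holds the (row, column) of table[i][j] inside the function matrix."""
    where = {a: (i, j)
             for i, row in enumerate(fmatrix)
             for j, a in enumerate(row) if a is not None}
    return [[where[a] for a in row] for row in table]


# Hypothetical toy case: two memories, four addresses.
T1 = [[0, 1], [2, 3]]
T2 = [[3, 0], [1, 2]]
f = {0: 0, 1: 1, 2: 0, 3: 1}

F = build_function_matrix(f, 2)   # [[0, 1], [2, 3]]
P1 = build_positional(T1, F)      # [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
P2 = build_positional(T2, F)      # [[(1, 1), (0, 0)], [(0, 1), (1, 0)]]
```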

[0030] Decoding turbo codes is performed using the T₁ and T₂ matrices, together with the P₁ and P₂ positional matrices, by accessing one of the T₁ or T₂ matrices during each parallel processing stage and using the corresponding positional matrix P₁ or P₂ to identify the address in the function matrix, where each column of the function matrix represents a different memory in the system of memories. For example, if a parallel operation required data from the third row of matrix T₁ (addresses 21, 5, 1, 19, 34), matrix T₁ would identify coordinates (2,0), (2,1), (2,2), (2,3) and (2,4), pointing to corresponding coordinates in matrix P₁ where coordinates (1,3), (1,5), (1,6), (1,1) and (1,2) are stored. These are the coordinates of the required addresses in function matrix f, and each lies in a different column (memory).
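The access pattern described above might be modelled in software as follows. This is only a sketch: the memories are modelled as plain lists, the name read_row is hypothetical, and the toy positional matrix and stored words are invented for illustration. The check that all column indices in a row differ corresponds to the requirement that each address of the row lies in a different memory.

```python
def read_row(memories, positional, row):
    """Fetch the words of one table row, one word from each distinct memory."""
    coords = positional[row]
    banks = [col for _, col in coords]
    if len(set(banks)) != len(banks):
        raise ValueError("two addresses of row %d share a memory" % row)
    # Each coordinate is (local address, memory index) in the function matrix.
    return [memories[col][local] for local, col in coords]


# Hypothetical toy case: memory 0 stores the words of addresses 0 and 2,
# memory 1 stores the words of addresses 1 and 3.
memories = [["w0", "w2"], ["w1", "w3"]]
P2 = [[(1, 1), (0, 0)], [(0, 1), (1, 0)]]   # positional matrix for T2 = [[3, 0], [1, 2]]

print(read_row(memories, P2, 0))   # ['w3', 'w0']
print(read_row(memories, P2, 1))   # ['w1', 'w2']
```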

[0031] Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.

What is claimed is:
 1. A process of accessing data stored at addressable locations in n memories comprising steps of: a) providing a function matrix having unique coordinates containing addresses of the addressable locations in the memories; b) sorting a set of addresses from first and second matrices, each having m rows and n columns, into unique coordinate locations such that each row contains no more than one address of a location in each respective memory; c) creating third and fourth matrices, each having m rows and n columns, the third and fourth matrices containing entries identifying coordinates of addresses in the function matrix and arranged so that each entry in the third matrix is at coordinates that match coordinates in the first matrix containing the corresponding address, and each entry in the fourth matrix is at coordinates that match coordinates in the second matrix containing the corresponding address; and d) accessing data in parallel from the memories using the function, first, second, third and fourth matrices.
 2. The process of claim 1, wherein the function matrix contains addresses from the first and second matrices arranged for parallel access.
 3. The process of claim 1, wherein step b) includes steps of: b1) organizing the addresses into first and second sets, S_(r) ^(q), each containing the addresses, and b2) sorting the first set of addresses into the first matrix and sorting the second set of addresses into the second matrix.
 4. The process of claim 3, wherein step b1) includes steps of: i) identifying a plurality of edges between the addresses such that each edge contains two addresses, and each address is unconnected or in not more than two edges, ii) linking the edges into a sequence, and iii) alternately assigning edges to first and second sets.
 5. The process of claim 4, wherein step b2) includes, for each of the first and second sets of addresses, steps of: i) dividing the set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), ii) placing the first and second subsets into respective rows of the respective first and second matrix, and iii) iteratively repeating steps i) and ii) until each row contains no more than one address of a location in each respective memory.
 6. The process of claim 3, wherein step b2) includes, for each of the first and second sets of addresses, steps of: i) dividing the set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), ii) placing the first and second subsets into respective rows of the respective first and second matrix, and iii) iteratively repeating steps i) and ii) until each row contains no more than one address of a location in each respective memory.
 7. A decomposer for decomposing a set of interleaved convolutional codes, the set of codes being arranged at coordinates in a function matrix, the decomposer comprising: first and second matrices, each having m rows and n columns defining coordinates, each of the first and second matrices containing the codes at coordinates such that each row contains no more than one code of a respective group; and third and fourth matrices, each having m rows and n columns, and containing entries identifying coordinates of addresses in the function matrix and arranged so that each entry in the third matrix is at coordinates that match coordinates in the first matrix containing the corresponding address, and each entry in the fourth matrix is at coordinates that match coordinates in the second matrix containing the corresponding address.
 8. The decomposer of claim 7, wherein the set of codes in the function matrix represents a function f:{0,1,2,3, . . . , n*m−1}→{0,1,2,3, . . . , 2^(k)−1}.
 9. The decomposer of claim 7, further including: an organizer for organizing the addresses into first and second sets, S_(r) ^(q), each containing the addresses, and a sorter for sorting the first set of addresses into the first matrix and sorting the second set of addresses into the second matrix.
 10. The integrated circuit of claim 9, wherein the organizer includes: an edge identifier for identifying a plurality of edges between the addresses such that each edge contains two addresses, and each address is unconnected or in not more than two edges, a linker for linking the edges into a sequence, and an assignor for alternately assigning edges to first and second sets.
 11. The integrated circuit of claim 10, wherein the sorter includes, for each set: a divider for dividing each set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
 12. The integrated circuit of claim 9, wherein the sorter includes, for each set: a divider for dividing each set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
 13. An integrated circuit containing a decoder for accessing data stored at addressable locations in n memories, the decoder comprising: a function matrix having unique coordinates containing addresses of the addressable locations in the memories; a decomposer including: first and second matrices, each having m rows and n columns, sorting apparatus for sorting a set of addresses to coordinate locations in the first and second matrices such that each row of the first and second matrices contains no more than one address of a location in each respective memory, and third and fourth matrices, each having m rows and n columns, and containing entries identifying coordinates of addresses in the function matrix and arranged so that each entry in the third matrix is at coordinates that match coordinates in the first matrix containing the corresponding address, and each entry in the fourth matrix is at coordinates that match coordinates in the second matrix containing the corresponding address; and decoder apparatus responsive to entries in the function, first, second, third and fourth matrices for accessing data in parallel from the memories.
 14. The integrated circuit of claim 13, wherein the function matrix contains addresses from the first and second matrices arranged for parallel access.
 15. The integrated circuit of claim 13, wherein the decomposer includes: an organizer for organizing the addresses into first and second sets, S_(r) ^(q), each containing the addresses, and a sorter for sorting the first set of addresses into the first matrix and sorting the second set of addresses into the second matrix.
 16. The integrated circuit of claim 15, wherein the organizer includes: an edge identifier for identifying a plurality of edges between the addresses such that each edge contains two addresses, and each address is unconnected or in not more than two edges, a linker for linking the edges into a sequence, and an assignor for alternately assigning edges to first and second sets.
 17. The integrated circuit of claim 16, wherein the sorter includes, for each set: a divider for dividing each set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
 18. The integrated circuit of claim 15, wherein the sorter includes, for each set: a divider for dividing each set, S_(r) ^(q), of addresses into first and second subsets S_(r+1) ^(2q) and S_(r+1) ^(2q+1), placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.